<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Network partitions</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
    <link rel="up" href="rep.html" title="Chapter 13.  Berkeley DB Replication" />
    <link rel="prev" href="rep_twosite.html" title="Special considerations for two-site replication groups" />
    <link rel="next" href="rep_faq.html" title="Replication FAQ" />
  </head>
  <body>
    <div xmlns="" class="navheader">
      <div class="libver">
        <p>Library Version 12.1.6.2</p>
      </div>
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">Network partitions</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="rep_twosite.html">Prev</a> </td>
          <th width="60%" align="center">Chapter 13.  Berkeley DB Replication </th>
          <td width="20%" align="right"> <a accesskey="n" href="rep_faq.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="rep_partition"></a>Network partitions</h2>
          </div>
        </div>
      </div>
      <p>
        The Berkeley DB replication implementation can be affected
        by network partitioning problems.
    </p>
      <p>
        For example, consider a replication group with N members.
        The network partitions with the master on one side and more
        than N/2 of the sites on the other side. The sites on the side
        with the master will continue forward, and the master will
        continue to accept write queries for the databases.
        Unfortunately, the sites on the other side of the partition,
        realizing they no longer have a master, will hold an election.
        The election will succeed as there are more than N/2 of the
        total sites participating, and there will then be two masters
        for the replication group. Since both masters are potentially
        accepting write queries, the databases could diverge in
        incompatible ways.
    </p>
      <p>
        If multiple masters are ever found to exist in a replication
        group, a master detecting the problem will return
        <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_DUPMASTER" class="olink">DB_REP_DUPMASTER</a>. Replication Manager applications
        automatically handle duplicate master situations. If a Base
        API application sees this return, it should reconfigure itself
        as a client (by calling <a href="../api_reference/C/repstart.html" class="olink">DB_ENV-&gt;rep_start()</a>), and then call for an
        election (by calling <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a>). The site that wins the
        election may be one of the two previous masters, or it may be
        another site entirely. Regardless, the winning system will
        bring all of the other systems into conformance.
    </p>
      <p>
        As another example, consider a replication group with a
        master environment and two clients A and B, where client A may
        upgrade to master status and client B cannot. Then, assume
        client A is partitioned from the other two database
        environments, and it becomes out-of-date with respect to the
        master. Then, assume the master crashes and does not come back
        on-line. Subsequently, the network partition is restored, and
        clients A and B hold an election. As client B cannot win the
        election, client A will win by default, and in order to get
        back into sync with client B, possibly committed transactions
        on client B will be unrolled until the two sites can once
        again move forward together.
    </p>
      <p>
        In both of these examples, there is a phase where a newly
        elected master brings the members of a replication group into
        conformance with itself so that it can start sending new
        information to them. This can result in the loss of
        information as previously committed transactions are
        unrolled.
    </p>
      <p>
        In architectures where network partitions are an issue,
        applications may want to implement a heartbeat protocol to
        minimize the consequences of a bad network partition. As long
        as a master is able to contact at least half of the sites in
        the replication group, it is impossible for there to be two
        masters. If the master can no longer contact a sufficient
        number of systems, it should reconfigure itself as a client,
        and hold an election. Replication Manager does not currently
        implement such a feature, so this technique is only available
        to Base API applications.
    </p>
      <p>
        There is another tool applications can use to minimize the
        damage in the case of a network partition. By specifying an
        <span class="bold"><strong>nsites</strong></span> argument to
        <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a> that is larger than the actual number of database
        environments in the replication group, Base API applications
        can keep systems from declaring themselves the master unless
        they can talk to a large percentage of the sites in the
        system. For example, if there are 20 database environments in
        the replication group, and an argument of 30 is specified to
        the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a> method, then a system will have to be able to
        talk to at least 16 of the sites to declare itself the master.
        Replication Manager automatically maintains the number of
        sites in the replication group, so this technique is only
        available to Base API applications.
    </p>
      <p>
        Specifying a <span class="bold"><strong>nsites</strong></span>
        argument to <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a> that is smaller than the actual number
        of database environments in the replication group has its uses
        as well. For example, consider a replication group with 2
        environments. If they are partitioned from each other, neither
        of the sites could ever get enough votes to become the master.
        A reasonable alternative would be to specify a <span class="bold"><strong>nsites</strong></span> argument of 2 to one of the
        systems and a <span class="bold"><strong>nsites</strong></span> argument
        of 1 to the other. That way, one of the systems could win
        elections even when partitioned, while the other one could
        not. This would allow one of the systems to continue accepting
        write queries after the partition.
    </p>
      <p>
        In a two-site group, Replication Manager by default reacts to
        the loss of communication with the master by observing a
        strict majority rule that prevents the survivor from taking
        over. Thus it avoids multiple masters and the need to unroll
        some transactions if both sites are running but cannot
        communicate. But it does leave the group in a read-only state
        until both sites are available. If application availability
        while one site is down is a priority and it is acceptable to
        risk unrolling some transactions, there is a configuration
        option to turn off the strict majority rule and allow the
        surviving client to declare itself to be master. See the
        <a href="../api_reference/C/repconfig.html" class="olink">DB_ENV-&gt;rep_set_config()</a> method <a href="../api_reference/C/repconfig.html#config_DB_REPMGR_CONF_2SITE_STRICT" class="olink">DB_REPMGR_CONF_2SITE_STRICT</a> flag for more
        information.
    </p>
      <p>
        Preferred master mode is another alternative for two-site
        Replication Manager replication groups. It allows the survivor
        to take over after the loss of communication with the master.
        When communications are restored, it always preserves the
        transactions from the preferred master site. See
        <a class="xref" href="rep_twosite.html#twosite_prefmas" title="Preferred master mode">Preferred master mode</a>
        for more information.
    </p>
      <p>
        These scenarios stress the importance of good network
        infrastructure in Berkeley DB replicated environments. When
        replicating database environments over sufficiently lossy
        networking, the best solution may well be to pick a single
        master, and only hold elections when human intervention has
        determined the selected master is unable to recover at
        all.
    </p>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="rep_twosite.html">Prev</a> </td>
          <td width="20%" align="center">
            <a accesskey="u" href="rep.html">Up</a>
          </td>
          <td width="40%" align="right"> <a accesskey="n" href="rep_faq.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Special considerations for two-site replication groups </td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top"> Replication FAQ</td>
        </tr>
      </table>
    </div>
  </body>
</html>
